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Abstract 

Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists 
and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species 
utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database 
of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined 
on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs 
tested, 31 1 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis 
and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses 
showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. 
sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea 
being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist 
and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and 
phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail 
regarding association genetics and functional studies. 



Citation: Zhong X, Li Q, Yu H, Kong L (2014) SNP Mining in Crassostrea gigas EST Data: Transferability to Four Other Crassostrea Species, Phylogenetic Inferences 
and Outlier SNPs under Selection. PLoS ONE 9(9): e1 08256. doi:1 0.1 371/joumal.pone.OI 08256 

Editor: Donald James Colgan, Australian Museum, Australia 

Received June 7, 2014; Accepted August 28, 2014; Published September 19, 2014 

Copyright: © 2014 Zhong et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its 
Supporting Information files. 

Funding: This study was supported by the grants from 863 Program (2012AA10A405-6), 973 Program (201 0CB1 26406), and National Natural Science Foundation 
of China (31372524). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. 

Competing Interests: The authors have declared that no competing interests exist. 

* Email: qili66@ouc.edu.cn 



Introduction 

Oysters are widely distributed throughout tropical and subtrop- 
ical regions, inhabiting near-shore areas, shallow waters, bays, and 
estuaries [1]. Crassostrea oysters are important commercial species 
and account for most of the world's oyster production. Approx- 
imately 20 species make up the genus Crassostrea, of which C. 
gigas has become the leading species in world shellfish culture 
because of its rapid growth and capacity to adapt to various 
environmental conditions. Besides C. gigas, C. hongkongensis, C. 
ariakensis, C. sikamea and C. angulata are locally important 
species in China, Japan, Korea, the United States and some 
European countries. The rapid growth of the oyster aquaculture 
industry as well as intentional introduction or transplantation of 
oysters pressingly requires an appropriate understanding of the 
genetic variation within and among various oyster species. 
However, conventional taxonomic and phylogenetic studies based 
on morphology and geographic range information have proved 
problematic because of highly plastic shell patterns and overlap- 
ping geographic distributions [2-4] . There are ongoing debates as 
to the species designations in the genus Crassostrea, such as the 
specific status of C. gigas and C. angulata, and the nomenclature 
of C. hongkongensis and C. ariakensis. The ongoing confusion 
about oyster taxonomy and identification has become an 



impediment to further investigation of the genetics and conserva- 
tion of oysters. 

In recent years, relationships and identification of oyster species 
have been investigated by using allozymes, randomly amplified 
polymorphic DNA (RAPD), restriction fragment length polymor- 
phism (RFLP) and DNA sequences such as mitochondrial and 
nuclear genes [5-11]. Particularly, the ability to sequence and 
compare whole mitochondrial genomes provides a new insight 
into phylogenetic relationships of oysters [12-14]. However, 
mtDNA loci are uniparentally inherited and cannot alone 
represent all historical and contemporary processes acting upon 
a population [15]. Moreover, because mtDNA is fast evolving and 
nucleotide mutations may return to an earlier state, its sequences 
may not allow deep phylogenetic reconstruction [12]. Hence, 
incorporating nuclear markers appears necessary to increase 
confidence in determining the relationships of Crassostrea species. 

Single-nucleotide polymorphisms (SNPs) have become corner- 
stone markers for a wide variety of genetic applications because 
they are the most abundant class of polymorphisms in genomes, 
and can be genotyped cost-effectively [16,1 7]. Besides, SNP can be 
found within the genomic sequences of gene candidates for 
artificial or natural selection and therefore they might be more 
informative for evolutionary biology than markers such as 
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microsatellites and AFLPs. They offer a wide range of applications 
such as association studies, high-density linkage maps, traceability 
of genealogies and phylogenetic inference [18,19]. 

The rapid increase in the availability of EST sequences of 
Crassostrea gigas provides abundant resources for obtaining SNP 
markers [20-23]. To date, 320 SNPs have been developed for C. 
gigas by mining expressed sequence tags data, using the HRM 
method [24-26]. Nevertheless, SNP markers for C. hongkongensis, 
C. ariakensis, C. sikamea and C. angulata have not been 
documented. Transferred SNPs from C.gigas provide a valuable 
source of SNP markers for the four species. Such cross-species 
EST-SNPs will be useful for comparative mapping and phyloge- 
netic studies among species in Crassostrea. 

Here, 57 novel SNPs were developed from the NCBI EST 
database (http://www.ncbi.nlm.nih.gov/) of C. gigas and the 
cross-species transferability of 377 SNPs of C. gigas was tested 
among C. hotigkongensis , C. ariakensis, C. sikamea and C. 
angulata. Meanwhile, through the use of the cross-species SNPs, 
we reconstructed the phylogenetic relationships among the five 
Crassostrea species. Moreover, through the use of Fst outlier 
analysis, we identified candidate SNPs that may have been targets 
of selection. 

Materials and Methods 

Ethics Statement 

The field studies did not involve endangered or protected 
species. No specific permissions were required for the locations. 
The locations are not privately-owned or protected in any way. 

Oyster Materials and DNA Extraction 

Thirty-two C. gigas individuals from 2 populations (Popl: 16 
individuals from Weihai, Shandong province, China; Pop2: 16 
individuals from Rizhao, Shandong province, China) were used 
for validation of SNP polymorphisms. Five Crassostrea species 
collected from China were used for the examination of the 
transferability of SNPs, namely C. sikamea (from Nantong, Jiangsu 
Province), C. angulata (from Yueqing, Zhejiang Province), C. 
hongkongensis (from Xiamen, Fujian Province), C. ariakensis 
(from Shantou, Guangdong Province) and C. gigas (from Rushan, 
Shandong Province) (Table 1). A set of species-specific COI 
primers was used for species identification according to the study 
of Wang & Guo [10]. 

DNA was extracted from frozen adductor muscle tissue by a 
modification of the standard phenol-chloroform procedure 



previously described by Li et al. [27] and stored at — 30°C prior 
to genetic analysis. 

Data Mining for SNP Markers 

Sequences containing SNPs were annotated using BLASTx 
software [28], and sequence homology was accepted based on a 
cut-off E value of 1.0x10 6 . The informative strand and reading 
frame were identified by using the sequence with highest 
homology. The NCBI ORF finder (http://www.ncbi.nlm.nih. 
gov/gorf/ gorf.html) was used to determine whether SNPs were 
synonymous, non-synonymous or from untranslated regions 
(UTRs). 

Primer Design and PCR Conditions 

Primers were designed using the Primer Premier 5.0 program 
(PREMIER Biosoft International, Palo Alto, CA, USA). SNP 
markers were developed according to the procedure described by 
Zhong et al. [25] and genotyped using the high resolution melting 
(HRM) method on the LightCycler 480 real-time PCR instrument 
(Roche Diagnostics, Burgess Hill, UK). A total of 46,171 Pacific 
oyster EST sequences were downloaded from GenBank EST 
database (http://www.ncbi.nlm.nih.gov/). The sequences were 
assembled and clustered into contigs with SeqMan Pro software 
(DNASTAR Inc., Madison, WI, USA). A single-base mutation 
that occurred in four or more ESTs and that was surrounded by 
good flanking sequences was identified as a potential SNP for 
further analysis. 

The 10-ul reaction mixture contained 0.25 U Taq DNA 
polymerase (Takara, Dalian, China), 10 x PCR buffer, 0.2 mM 
dNTP mix, 0.2 uM of each primer set, 1.5 mM MgCl 2 , 5 uM 
SYT09 (Invitrogen Foster City, CA, USA) and 10 ng template 
DNA. The concentration of DNA was measured by a Nanodrop 
2000 spectrophotometer (Thermo Scientific, Waltham, MA). The 
PCR cycling conditions included an activation step at 95°C for 
5 minutes followed by 45-50 cycles of 95°C for 20 seconds, a 
touch down of 68°C to 58°C for 20 seconds (0.5°C/cycle) and 
72°C for 20 seconds. Following amplification, the products were 
denatured at 95°C for 1 min, and then annealed at 40°C for 
1 min to randomly form DNA duplexes. Melting curves were 
generated by heating samples from 60°C to 90°C with 25 data 
acquisitions per degree. Data were analyzed using the LightCycler 
480 Gene Scanning Software 1.5 (Roche Diagnostics). 



Table 1. Species included in this study, and the statistics of amplification success and polymorphism. 



Species 


Number of 
individuals 


Sample location (latitude, 
longitude) 


Number 


Percent 


Number 


Percent 








amplified 


amplified 


polymorphic 


polymorphic 


C. sikamea 


20 


Nantong, Jiangsu 
(31.91°N, 121.88'E) 


311 


82.50 


256 


67.90 


C. angulata 


19 


Yueqing, Zhejiang 
(28.15°N, 121.08°E) 


353 


93.60 


306 


81.20 


C. hongkongensis 


19 


Xiamen, Fujian 
(24.43°N,118.15°E) 


254 


67.40 


133 


35.30 


C. ariakensis 


19 


Shantou, Guangdong 
(23.35°N,116.63 C E) 


253 


67.10 


119 


31.60 


C. gigas 


19 


Rushan, Shandong 
(36.90°N, 121.80 E) 


377 


100 


335 


88.90 



doi:10.1371/journal.pone.0108256.t001 
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Table 2. Characterization of 57 polymorphic EST-SNPs derived from Crassostrea gigas. 



SNP name 


Accession 
no. 


Primer sequences (5 -3 ) 


Amplicon 
length (bp) 


SNP type and 
location 


Type 


Annotation 


CgSNP879 


HS1 48847 


F: 


ACTGGTCTCACCCCCATCAC 


60 


C/T (487) 


S (Pro) 


Unknown 






R 


AGTCCTATTCACTTCACTGCTGC 










CgSNP880 


HS1 40594 


F: 


AAGTGGTCATCGAAAAAGGTCTTC 


90 


G/T (632) 


S (Leu) 


Glutathione S-transferase theta-1 






R 


CGGCGAGGTATTTAGACTTCTCC 










CgSNP882 


HS243771 


F: 


TTAGAACCGATAATCCAAGGAAGTC 


76 


A/G (209) 


S (Glu) 


ADP-ribosylation factor-like protein 15 






R 


ACAATCATCTTTACTATTTTCTCTGCC 










CgSNP886 


HS210205 


F: 


TCTGGAAATACAATCTGCTGGC 


71 


C/T (221) 


S (Asp) 


hypothetical protein CGI_1 001 8860 






R 


CCTGGCTTTGATGAGGGCTT 










CgSNP890 


HS236510 


F: 


CGGAGTCGAATGAAACAGGAT 


77 


A/G (112) 


S (Pro) 


Unknown 






R 


TAGGTCTGATACATTGAAGTAAGCG 










CgSNP891 


HS236510 


F: 


TCTACATCGAAGGACAATTTTCAAG 


70 


G/T (250) 


N (Ser-Arg) 


Unknown 






R 


TTCCCGTTTCGGATATACAGACT 










CgSNP895 


FP008693 


F: 


CTCGGTCTCAGTCATTGCGG 


67 


A/G (82) 


S (Met) 


Unknown 






R 


GATTTCTCCTCTATCCTGCTTTCC 










CgSNP900 


HS238336 


F: 


TCCTGATAACATTGCTGTGTTTG 


70 


A/C (166) 


S (Gly) 


Protein BAT5 






R 


GTAGTTCATTGCTACCCATGATGC 










CgSNP909 


CU682103 


F: 


TTACAATTCAGAACAGGACAATGG 


74 


A/T (207) 


S (Leu) 


Macrophage mannose receptor 1 






R 


ACAAACTTTGAGTCTATGACTCGGT 










CgSNP913 


HS167108 


F: 


TGTTGGGAACGATTCATACGG 


77 


C/T (271) 


S (Asp) 


hypothetical protein CGI_1 0025728 






R 


CATTTCGGTGTTCACGATTGG 










CgSNP915 


FQ661219 


F: 


CCAATCCAGTGCCAAAGTCTC 


80 


A/G (317) 


S (Glu) 


Unknown 






R 


CAGCAACTAAATGGTCCACATAAC 










CgSNP917 


HS1 75405 


F: 


TTGTCCTTGTTAATTACTGCATTGC 


70 


C/T (226) 


S (Cys) 


Unknown 






R 


GCCTAGTTTGCGTAGGAGAGAG 










CgSNP924 


HS1 75248 


F: 


GCGGAGTCGGAGCATCAG 


58 


C/T (261) 


S (Cys) 


ELKS/RAB6-interacting/CAST family 
member 1 


R: TCAGGTCGTGGTTCCTCTTCAT 


CgSNP936 


CU993732 


F: 


CACACAAGAAGAAAACGCACAAGAT 


86 


A/G (604) 


S (Glu) 


Phosphatidylinositol-3,4,5-trisphosphate 
3-phosphatase TPTE2 






R 


TGGTAAAAGATGTCAGGAACAAGGT 










CgSNP940 


HS223847 


F: 


ATCACGACTGTAGGGCAGAGATTAT 


81 


G/T (202) 


N (Gln-His) 


Unknown 






R 


AGGTTTGGATTGAGCTTTTGTCTAG 










CgSNP942 


HS191752 


F: 


CCTCGGATCTGTTGATTGCTATT 


72 


A/G (564) 


S (Pro) 


Complement Clq tumor necrosis factor- 
related protein 3 






R 


TGTTCTGCCAGGGTATGTTCG 










CgSNP949 


HS231194 


F: 


CATCTCAGGGAAATGGAAGG 


72 


C/T (491) 


S (Tyr) 


Tetratricopeptide repeat protein 17 






R 


AAGAAACAAAATAATGAAGAGCG 










CgSNP958 


HS1 09673 


F: 


AATCCTTGATGAGCCGACG 


82 


C/T (715) 


S (Ala) 


ATP-binding cassette sub-family F 
member 3 


R: CCCTCCCTGGAATTTCAGTAT 


CgSNP970 


HS206217 


F: 


AAGAGATTTTATTGTAGAAGTTGACATAT 


88 


G/T (125) 


S (Ser) 


Unknown 


R: CATACCAAAAGAATCAATGAATACTC 


CgSNP980 


HS201459 


F: 


AAGACTGTGTGACGGTTCAGATG 


82 


A/G (471) 


S (Ser) 


Unknown 


R: AGCAGTGAAATGTTGGCGAT 


CgSNP989 


HS242001 


F: 


GCAGTGCATGTGGATGAGTAAGT 


81 


G/T (184) 


UTR 


Unknown 


R: CGCCATAAAGTTGAAAGTATTGAAC 


CgSNP990 


HS227296 


F: 


GGTTCCATTAAGCCATCCATTG 


71 


C/T (586) 


UTR 


Unknown 


R: GCAGACAGTATCAGCAGTCGTTG 


CgSNP994 


HS227373 


F: 


TGTATTTCAAGGCGTGTTACAGTG 


84 


C/T (694) 


N (Cys-Arg) 


Unknown 



R: ACTCATCAGTCAAGGGACAACAAG 
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Table 2. Cont. 



SNP name 


Accession 
no. 


Primer sequences (5 -3 ) 


Amplicon 
length (bp) 


SNP type and 
location 


Type 


Annotation 


CgSNP1003 


CU996515 


F: GTGAGAGACTGATGAGTGCCTGT 


72 


C/T (529) 


S (Gly) 


Unknown 


R: TATGAGTGATCAGGAATTCTGTAGC 


CgSNPIOlO 


FQ668992 


F: TCAAATCAAATCTGAACGGCG 


74 


C/T (580) 


S (Gly) 


Fibrinogen C domain-containing protein 
1 


R: CCAGTTATTGTACGGTCCCCAT 


CgSNP1016 


CU997800 


F: ATGTGATTGTCTCTTGAGAATGTGT 


75 


C/T (588) 


N (Val-Ala) 


Unknown 


R: CAGAGATGAAACCAGTATGTCTGAT 


CgSNP1019 


HS1 16482 


F: TCAGACACGGAGGGAAAATG 


98 


C/T (430) 


S (Pro) 


39S ribosomal protein L45, mitochondrial 


R: TCTTTGTCCTCTTTCCAAGTGTG 


CgSNP1021 


HS1 22227 


F: AGCCCACTGGAGGAAGAACC 


58 


C/T (154) 


N (Val-Ala) 


Unknown 


R: GGTATTCGGGATTGAATCTGTG 


CgSNP1023 


HS235875 


F: GCACTACATATCATACCAGACTGTG 


104 


A/G (377) 


N (Asn-Asp) 


Putative arylformamidase 


R: GTTTGTAAAATAATGCCCATAACTG 


CgSNP1024 


FQ666947 


F: GTCTAGGAGTTATTTCCCTTTGATG 


98 


A/C (554) 


UTR 


Unknown 


R: TGGATTTAGTGTTCACCAGTACAAG 


CgSNP1028 


CU997294 


F: ACAGACAAAATGACAAGAAAACAAC 


76 


C/T (606) 


S (Ser) 


Unknown 


R: CAGTGACCTCAGCAGCCATC 


CgSNP1029 


CU997294 


F: CTCTCACACCAGATATTTCCAGCAT 


82 


A/G (630) 


N (Lys-Glu) 


Unknown 


R: CTTCCTTTCAAGGTCACAATCACAC 


CgSNP1034 


HS1 80370 


F: CCTGTCTTTTAACACTGTTTCTGAT 


97 


A/T (322) 


N (Trp-Ser) 


Unknown 


R: GTCAGGACGTTTTCTGCTTTC 


CgSNP1037 


HS248681 


F: CCAAAGTGTACGCTGTAAGGAACC 


81 


G/T (161) 


S (Ser) 


Unknown 


R: CGTCAATGCTGATGGACAAGG 


CgSNP1038 


HS239387 


F: GCTATACCTTGTCCATCAGCATTG 


60 


C/T (708) 


N (Lys-Glu) 


Ufml-specific protease 1 


R: CATTAGTGTTGTTCACGGGGAG 


CgSNP1042 


HS229886 


F: AAGTCAGTGAAGAGCCACAAAC 


84 


A/C (280) 


S (Ser) 


lnterleukin-1 receptor-associated kinase 1 


R: AAACCTCATTAAATCCCAAGTGT 


CgSNP1043 


FP004709 


F: CAAGTTCCGAATGAAATACCTTCT 


85 


C/T (554) 


N (Tyr-Cys) 


hypothetical protein CGI_1 0008375 


R: CTCAAAATAGCTGTCCCTGTGTG 


CgSNP1045 


FP004709 


F: GACAGATAACAACTCTCAAGCAAAC 


67 


A/C (688) 


S (Leu) 


hypothetical protein CGI_1 0008375 


R: CACATATCGTTACGAAACCGAG 


CgSNP1047 


HS233108 


F: TCTGGAGGCTGTATGCTGAGTT 


65 


A/G (364) 


S (Gin) 


Tetratricopeptide repeat protein 27 


R: CTTTTGTTGTGTTTCCGCTGT 


CgSNP1050 


CU986514 


F: CAAGTGTCCTGTATGTTGACAGTC 


64 


A/G (754) 


N (Met-Val) 


Mu-crystallin-like protein 


R: GATAAAATTACATCCCCACTCTCTT 


CgSNP1052 


HS170919 


F: TCCTGTTGCATCAGTATTCAAGATT 


87 


A/T (233) 


N (Leu-Ter) 


Unknown 


R: AAGCCTCAAAGTATGACCAGCAC 


CgSNP1054 


FP010213 


F: GTAGCTTGGATATTACTGTGAGGC 


77 


G/T (205) 


UTR 


Unknown 


R: CATGGAAATCTCGGTATAAACTTG 


CgSNP1055 


FP010213 


F: GATGAGTGCTTACATCAATCTGAGT 


92 


C/T (371) 


N (Met-Thr) 


Unknown 


R: CAAGACACAAAAACACATGCTTATAC 


CgSNP1056 


HS1 62699 


F: GCTGTTTGGTCTGGTGTTTGT 


79 


A/G (567) 


N (Asn-Asp) 


Unknown 


R: TTGAAAGCATGAAGATTTCTATCAC 


CgSNP1058 


CU997792 


F: AAGGAAATTCCCTGCACAAAC 


78 


A/C (967) 


N (lie-Leu) 


GTP-binding protein GEM 


R: GTCCACACAAGATAAAAGAGAAGAG 


CgSNP1061 


CU986467 


F: CAGAGGACCAGTTTGAGGCTT 


63 


A/G (798) 


S (Val) 


BCCIP-like protein 


R: CTTGTTTGAGTTTGTCTGCGG 


CgSNP1069 


HS1 37887 


F: CGTGGAAATTCTGTGTAAATAGGAC 


76 


C/T (377) 


N (Ser-Gly) 


Unknown 



R: CTTCGGTTCGATTATGCTGC 
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Table 2. Cont. 





SNP name 


Accession 
no. 


Primer sequences (5 -3 ) 


Amplicon 
length (bp) 


SNP type and 
location 


Type 


Annotation 


CgSNPI 073 


HS1 39503 


F: 


GCTGCCAG I I I I I CTCATTCAC 


74 


C/T (523) 


S (Thr) 


1 5-hydroxyprostaglandin dehydrogenase 
[NAD+] 






R 


AACCAAGGACACATACGGACAAC 










CgSNP1074 


HS220139 


F: 


CATGGTGACTAAATCTTCAATGTTGT 


87 


A/G (465) 


S (Ser) 


Exosome component 10 






R 


AAGGCTGTGAGTAGAGGTTTGGC 










CgSNPI 077 


AM853850 


F: 


CTGAGGCACAAAGTCTGGGTAGT 


73 


C/T (356) 


N (Met-Thr) 


Unknown 






R 


GGAGGAGTAGGTGACCGCTTC 










CgSNPI 082 


FP009397 


F: 


TATTAGGACCACATTCAGCTATGTC 


88 


A/G (256) 


S (Pro) 


Unknown 






R 


ATTGATGGGGGTGGAGGTAC 










CgSNPI 105 


FP008773 


F: 


CAAGAGTTGACACCAGAGGGAG 


83 


C/T (285) 


S (Thr) 


Unknown 






R 


CATCAAATACACGATGACCTGAG 










CgSNP1 1 15 


HS204076 


F: 


TCGGTCACTGTTGGATTTCTG 


84 


A/G (378) 


S (Leu) 


Heat shock 70 kDa protein 12B 






R 


GAACAACCCGAATTCACGACC 










CgSNPI 117 


HS1 16629 


F: 


TAGTAAAGGCTAAACAAAGTGTGCT 


71 


G/T (337) 


S (Val) 


Unknown 






R 


AGGGAGAGTCCGAGATGTCAC 










CgSNP1 118 


CU998279 


F: 


GACGAGTGAACGAGTACGGC 


65 


C/T (198) 


S (Tyr) 


Protocadherin-19 






R 


TGGTCTATACGCAGAATAAGGAAT 










CgSNPI 130 


HS142312 


F: 


CAAGGGACAGAGTTCAATGTCTTCT 


86 


A/G (585) 


S (Leu) 


Unknown 






R 


TGACAGGATTTCTTGCATCTTTACC 










CgSNPI 131 


HS225071 


F: 


ATGTGCTTTTTACCCGAACTGC 


63 


A/G (477) 


N (Asp-Asn) 


Poly [ADP-ribose] polymerase 12 






R 


ACCTGTTTTGGTTGCTCGTCTT 











Note: S, synonymous; N, non-synonymous; UTR, untranslated region. 
doi:1 0.1 371 /joumal.pone.01 08256.t002 



Data Analysis 

Shannon's Information index, expected heterozygosity (H e ), 
observed heterozygosity (H„) and Nei's genetic distance [29] were 
calculated using POPGENE 1.32 software [30]. Phylogenetic trees 
were constructed using the neighbor joining (NJ) method 
implemented in MEGA 5.05 and POPTREE2 [31,32]. Bootstrap 
analyses with 1000 replicates were performed to test the support 
for the branches of a phylogenetic tree. 

Arlequin version 3.5.1.3 software was used to calculate pairwise 
Fst between all pairs of species using 1 0000 permutations to test for 
significance (0.01). Oudier SNPs were tested using two island 
models, as implemented in Arlequin. We conducted 50000 
coalescent simulations with 5 demes under a finite island-model. 
The analysis was also performed utilizing a hierarchical island 
model based on 3 groups of 3 demes with 50000 simulations to 
generate the joint distribution of Fst versus heterozygosity. Pre- 
defined population groupings were set as three groups (group 1 : C. 
sikamea, C. angulata and C. gigas; group 2: C. hongkongensis; 
group 3: C. ariakensis) based on the pairwise Fst values. Loci that 
fall out of the 99% confidence intervals of the distribution were 
identified as outliers being putatively under selection. The putative 
function of genes with outlier SNPs was identified using the Gene 
Ontology (GO) annotation by mining the Swiss-Prot database. 

Results 

Development and Transferability of SNPs 

In the study, 262 putative SNPs were selected for validation. 
Among these, 57 SNPs (22%) were polymorphic and considered as 
validated. Information about the panel of loci is summarized in 



Table 2. The 57 substitutions included 41 transitions and 16 
transversions. Of the polymorphic SNPs, 30 (52.6%) could not be 
annotated, 53 (93.0%) were located in the coding region, and 4 
(7.0%) in the UTR. Eighteen of the 53 SNPs located within the 
coding region were nonsynonymous and 35 synonymous. 

A total of 377 SNPs of C. gigas including 320 previously 
developed SNPs [24—26] and 57 new SNPs developed here were 
used to test the transferability in 4 other Crassostrea species: C. 
sikamea, C. angulata, C. hongkongensis and C. ariakensis. The 
basic information obtained with each SNP is shown in Table S 1 . 
Out of the 377 primer pairs tested, 311 (82.5%) primers showed 
amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 
(67.4%) in C. hongkongensis, 253 (67.1%) in C. ariakensis and 377 
(100%) in C. gigas. Using the 377 primer pairs, 256 (67.9%) SNP 
loci were polymorphic in C. sikamea, 306 (81.2%) in C. angulata, 
133 (35.3%) in C. hongkongensis, 119 (31.6%) in C. ariakensis and 
335(88.9%) in C. gigas (Table 1). In total, 214 SNPs could give 
successful amplification in all the five Crassostrea species and 48 
SNPs showed polymorphism in all the five species. 

Phylogenetic Relationships 

A total of 214 SNPs was used for the phylogenetic analysis. 
Information of the 214 SNPs evaluated from the 5 species is shown 
in Table 3. The values of observed heterozygosity (Ho) and 
expected heterozygosity (He) ranged from 0.0792 (C. hongkon- 
gensis) to 0.2895 (C. gigas) and from 0.1026 (C. hongkongensis) to 
0.3229 (C. gigas), respectively. Shannon's Information index and 
the number of polymorphic loci ranged from 0.1664 (C. 
ariakensis) to 0.4749 (C. gigas) and from 99 (C. ariakensis) to 
201 (C. gigas). Nei's genetic distance values ranged from 0.0738 
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Figure 1. Phylogenetic tree of five Crassostrea species using 
neighbor joining (NJ) method based on Nei's genetic distance 
derived from 214 SNPs. Numbers above branches indicate bootstrap 
values from NJ analysis using both MEGA 5.05 and POPTREE2 softwares. 
doi:1 0.1 371 /journal.pone.01 08256.g001 

(C. angulata and C. gigas) to 0.2728 (C. hongkongensis and C. 
gigas) (Table 4). All Fst estimates were statistically significant (P< 
0.01). Pairwise Fst ranged from 0.1230 (C. angulata and C. gigas) 
to 0.5257 (C. hongkongensis and C. ariakensis). The phylogenetic 
tree separated the five species into two clusters (Figure 1). The first 
cluster included two species, C. hongkongensis and C. ariakensis. 
This clade was sister to the clade containing C. sikatnea, C. 
angulata and C. gigas. In this clade, C. gigas and C. angulata had 
the closest relationship, with C. sikamea being the sister group. 
Phylogenetic analysis using the unweighted pair-group method 
with arithmetic mean (UPGMA) generated an identical topology 
with high support values (data not shown). 

Outlier SNPs 

Loci showing higher or lower differentiation with respect to the 
simulated confidence intervals are identified as candidates for 
positive or balancing selection [33]. The Arlequin fdist method 
revealed 10 candidate SNPs (CgSNP28, CgSNP230, CgSNP273, 
CgSNP415, CgSNP420, CgSNP515, CgSNP524, CgSNP544, 
CgSNP669 and CgSNP805) for selection, including 7 for positive 
selection and 3 for balancing selection (Table 5 and Figure 2a). In 
addition, the hierarchical method detected 1 1 oudier loci 
(CgSNP14, CgSNP203, CgSNP803, CgSNP273, CgSNP415, 
CgSNP420, CgSNP515, CgSNP524, CgSNP544, CgSNP669 
and CgSNP805) for selection, including 9 for positive selection 
and 2 for balancing selection (Table 5 and Figure 2b). Both 
approaches revealed 8 SNPs lying outside the 99% confidence 
region of the conditional joint distribution of Fst and heterozy- 
gosity, including 6 for positive selection and 2 for balancing 
selection. Among the 8 SNPs, 5 located within the coding region 
were synonymous and 3 nonsynonymous. The putative function of 



three genes (UPF0686 protein, ankyrin repeat domain-containing 
protein 60, and hypothetical protein CGI_1 00 16494) could not be 
identified using GO searches. The other five proteins (endoglu- 
canase, rho-related GTP-binding protein, flap endonuclease 1-A, 
hypothetical protein CGI_1 0023940 and Chlorophyllase-2) were 
respectively involved in carbohydrate metabolism, GTPase-medi- 
ated signal transduction, DNA repair, DNA binding and 
chlorophyll catabolic process. 

Discussion 

A total of 48769 potential SNPs were detected by mining the C. 
gigas EST database [25]. In our studies, the 1283 putative SNPs 
selected for validation allowed the development of 57 new SNPs 
bringing the total to 377 SNPs that have been validated in this 
species [24—26],. Among the 377 SNPs, 66 SNPs are known to be 
distributed in 8 linkage groups of C. gigas [26]. Compared to the 
use of several (often partial) genes, the adequate number of EST- 
SNPs, distributed in almost all linkage groups of C. gigas, may 
provide more genetic information which is valuable for phyloge- 
netic analyses. The high cross-species transferability of the set of 
377 EST-SNPs of C. gigas tested in four other Crassostrea species 
also suggests their potential utilization in evolutionary analysis 
across taxa of the genus Crassostrea. Moreover, mutations 
resulting in some SNPs can be responsible for an adaptive 
phenotype or the direct target of selection. Studies have shown 
that variation in allele frequencies at some outlier SNP loci can be 
correlated with environmental variables, such as salinity and 
temperature [34,35]. Consequently, die SNP markers offer a 
valuable opportunity to understand the genetic basis of phenotypic 
variation in relation to environmental variation. 

In general, the more evolutionarily distant the taxa, the less 
successful is cross amplification [36,37]. In a previous study, 15 
EST-SSRs developed for C. gigas amplified successfully in at least 
one species, with C. sikamea sharing 14 (93.3%) primer pairs, C. 
hongkongensis 12 (80.0%), and C. ariakensis 11(73.3%) [38]. 
Hedgecock et al. [39] tested 86 genomic SSRs developed for C. 
gigas in cross-species amplification, 83 (96.5%) were likely useful 
for C. angulata, 71 (82.6%) for C. sikamea and 31 (36.0%) for C. 
ariakensis. Our data also showed C. angulata (93.6%) and C. 
sikamea (82.5%) had higher cross-amplification rates than both C. 
hongkongensis (67.4%) and C. ariakensis (67.1%). These results 
suggest that C. gigas has a closer relationship with C. angulata and 
C. sikamea than with C. hongkongensis and C. ariakensis. 

The taxonomy of Crassostrea has been studied for many years, 
but confusions still exist. There is an open debate as to whether C. 
gigas and C. angulata are distinct species [9,13,40,41]. Some 
experts have argued that they are different species but genetically 
closely related [12,40,41], but other phylogenetic analyses suggest 
that the two should be considered one species [9,42]. In our study, 



Table 3. Characterization of 214 polymorphic EST-SNPs evaluated from 5 Crassostrea species. 





Species 


SI 


Ho 


He 


Number polymorphic 


Percent polymorphic 


C. sikamea 


0.361 7±0.2442 


0.1942±0.1792 


0.2394±0.1827 


176 


82.24 


C. angulata 


0.4196±0.2325 


0.2538±0.1924 


0.2829±0.1763 


186 


86.92 


C. hongkongensis 


0.1691 ±0.2040 


0.0792±0.1197 


0.1026±0.1383 


111 


51.87 


C. ariakensis 


0.1664±0.2166 


0.1021 ±0.1627 


0.1039±0.1469 


99 


46.26 


C. gigas 


0.4749±0.2000 


0.2895±0.1860 


0.3229±0.1569 


201 


93.93 



Note: SI, Shannon's Information index; He, expected heterozygosity; Ho, observed heterozygosity. 
doi:1 0.1 371 /joumal.pone.01 08256.W03 
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Table 4. Pairwise Nei's genetic distance 
SNPs. 


(lower diagonal) and Fst values (upper diagona) among 5 Crassostrea species using 214 




Species 


C. sikamea 


C. angulata 


C. hongkongensis 


C. ariakensis 


C. gigas 


C. sikamea 




0.2486 


0.5250 


0.5045 


0.2662 


C. angulata 


0.1327 




0.4827 


0.4824 


0.1230 


C. hongkongensis 


0.2641 


0.2525 




0.5257 


0.4707 


C. ariakensis 


0.2420 


0.2535 


0.1396 




0.4402 


C. gigas 


0.1607 


0.0738 


0.2728 


0.2380 





doi:1 0.1 371 /journal.pone.01 08256.t004 



C. gigas and C. angulata were recovered as separate clades, 
suggesting that C. gigas and C. angulata may be two distinct 
species. However, the low Nei's genetic distance value between C. 
angulata and C. gigas (0.0738) indicates a very close relationship 
between them. Furthermore, C. angulata and C. gigas can cross- 
fertilize without any difficulty in the laboratory and form viable, 
fertile offspring [43-45]. Therefore, we still can not conclude that 
C. gigas and C. angulata are two distinct species. A large amount 
of the two species sampled from a wide geographic range and the 
same locations are required to better resolve this problem. Another 
species, C. hongkongensis has been routinely misidentified as C. 
ariakensis for a long time. In our study, C. hongkongensis and C. 
ariakensis were recovered as separate clades. Moreover, the Nei's 
genetic distance between C. hongkongensis and C. ariakensis 
(0. 1 396) was a little higher than that observed between two closely 
related sister species (between C. angulata and C. sikamea, 
0.1327). The above data suggest that C. hongkongensis and C. 
ariakensis are two distinct species. Yu & Li [14] analyzed the 
complete mitochondrial DNA sequence and determined that C. 
hongkongensis and C. ariakensis are two separate species. Reece et 
al. [9] also suggested that the C. ariakensis sequences formed a 
distinct clade from C. hongkongensis in the COI tree. Therefore, 
we can conclude that C. hongkongensis and C. ariakensis are two 
separate species. 

Identifying the regions of the genome that are shaped by 
adaptation to different environments can be relevant to answering 
several important questions in evolutionary biology. Among many 
selection detection strategies, Fst outlier approaches are becoming 
widely used in identifying genes without known phenotypes that 
are under selection [33,46,47]. These methods can identify 



relatively highly differentiated markers (so-called outlier loci) in 
comparison to expected levels under neutrality inferred from 
coalescent simulations [48,49]. Strong oudier patterns have been 
classically interpreted as being caused by divergent selection 
affecting the loci themselves or genes strongly linked with them 
[50]. Indeed, an alternative explanation for strong genetic 
divergence at some loci exists and is difficult to rule out when 
the tests are being made on comparisons of distinct species. Bierne 
et al. [5 1] advocate the role of pre- or postzygotic genetic barriers 
in genetic divergence. Such endogenous barriers could be the 
consequence of incompatibilities between combinations of alleles, 
established through selective mechanisms that are independent 
from adaptation to habitats [35]. To increase confidence in the 
conclusions reached, two-island models and a high confidence 
level (99%) were used in the Fst outlier analysis. 

Eight loci were identified as being possible targets of selection 
following two Fst outlier tests. Among the 8 SNPs, 5 located within 
the coding region were synonymous and 3 nonsynonymous. While 
nonsynonymous oudier SNPs are particularly interesting due to 
the potential effect of amino acid changes on protein structure and 
function, synonymous SNPs should not be simply dismissed as 
false-positives. This is because natural selection may affect 
synonymous codon usage in some genes, leading to codon usage 
bias [52,53]. Furthermore, there is increasing evidence that silent 
mutations may have functional effects either on translational 
efficiency and accuracy, or on mRNA stability and splicing. 
Another explanation is that they might carry the footprint of 
selection on a beneficial allele that is closely linked to the oudier 
SNP. 





0.3 0.4 0.5 0.6 0.7 

Heterozygosity 



(a) 



(b) 



Figure 2. Plot of Fst against heterozygosity for 21 4 SNPs analysed with the fdist (a) and hierarchical (b) methods. The upper and lower 
lines are the 99% confidence intervals. 
doi:1 0.1 371 /journal.pone.01 08256.g002 
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In marine environments, environmental factors such as 
temperature, salinity, pH and dissolved oxygen often interact in 
complex ways leading to a complicated 'fitness landscape'. In our 
study, C. angulata and C. gigas were sampled from coastal zones, 
whereas C. sikamea, C. hongkongensis and C. ariakensis were 
sampled from estuarine zones. Moreover, the five species were 
collected from 5 sites across 13° of latitude along the coast of 
China. Therefore, water temperature and salinity may be 
environmental variations relevant to fitness. The importance of 
the cytoskeleton in the adaptation to water temperature and 
salinity is well known [54—56]. Major players during cytoskeletal 
remodeling are rho-GTPases, upstream molecular switches 
triggering signaling cascades that target cytoskeletal effector 
proteins to induce morphological change [57]. Another key aspect 
of the cell stress response is modulation of pathways of energy 
metabolism [58]. The data presented here reveal that two genes 
with outlier SNPs (endoglucanase and rho-related GTP-binding 
protein) are involved in carbohydrate metabolism and GTPase- 
mediated signal transduction. Furthermore, the ankyrin repeat 
domain-containing protein 60 may be involved in cytoskeletal 
motility regulation [59], Although the genomic scan provides an 
encouraging result, association genetics and functional studies are 
ultimately required to confirm that particular loci are involved in 
responding to environmental variations. 

In summary, a total of 57 SNPs from EST sequences in C. gigas 
were developed using HRM method. The study confirmed a high 
cross-species transferability of the set of 377 EST-SNPs of C. gigas 
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tested in four other Crassostrea species. Additionally, the current 
study represents an initial attempt at resolving phylogenetic 
relationships in Crassostrea species, using a large collection of 
cross-species SNP markers. The NJ analysis revealed two main 
groups of the five Crassostrea species. The first clade included C. 
hongkongensis and C. ariakensis. C. hongkongensis was a sister 
species of C. ariakensis. This clade was sister to the clade 
containing C. sikamea, C. angulata and C. gigas. C. gigas and C. 
angulata had the closest relationship, with C. sikamea being the 
sister group. Finally, the work, using Fst oudier approaches, 
presented evidence for adaptive genetic divergence in Crassostrea 
species. Further functional studies are needed to confirm the role 
of these outlier loci or genome segments in Crassostrea species. 
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